Tree Induction vs. Logistic Regression: A Learning-Curve Analysis
Authors
Abstract
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. Contrary to prior observations, logistic regression does not generally outperform tree induction. More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of signal-to-noise ratio.
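The kind of learning-curve comparison the abstract describes can be sketched as follows. This is a minimal illustration and not the study's protocol: it assumes scikit-learn is available, uses a CART-style `DecisionTreeClassifier` as a stand-in for the tree learner, generates synthetic data with `make_classification`, and takes AUC as one possible measure of ranking quality. The function name `learning_curves`, the training sizes, and all hyperparameters are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def learning_curves(train_sizes=(50, 200, 800, 3200), seed=0):
    """Return {model_name: [(n_train, accuracy, auc), ...]} on a fixed test set."""
    # Synthetic stand-in data (assumption); the study's data sets are not used here.
    X, y = make_classification(n_samples=6000, n_features=20, n_informative=8,
                               flip_y=0.05, random_state=seed)
    # Hold out 2000 examples for testing; the remaining 4000 form the training pool.
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=2000, stratify=y, random_state=seed)

    # Factories so that every training size gets a freshly initialised model.
    models = {
        "logistic_regression": lambda: LogisticRegression(max_iter=1000),
        "tree": lambda: DecisionTreeClassifier(min_samples_leaf=5,
                                               random_state=seed),
    }

    curves = {name: [] for name in models}
    for n in train_sizes:
        # Nested training sets: the first n examples of the shuffled pool.
        X_train, y_train = X_pool[:n], y_pool[:n]
        for name, make_model in models.items():
            model = make_model().fit(X_train, y_train)
            accuracy = accuracy_score(y_test, model.predict(X_test))
            # Rank test examples by the estimated probability of class 1.
            auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
            curves[name].append((n, accuracy, auc))
    return curves


if __name__ == "__main__":
    for name, points in learning_curves().items():
        for n, accuracy, auc in points:
            print(f"{name:20s} n={n:5d} accuracy={accuracy:.3f} auc={auc:.3f}")
```

Comparing the two lists of points size by size shows whether one method leads at every training-set size or whether the curves cross. Which method comes out ahead depends on the data, so results on this synthetic set say nothing about the domains examined in the paper.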
Similar resources
Comparison of Gestational Diabetes Prediction Between Logistic Regression, Discriminant Analysis, Decision Tree and Artificial Neural Network Models
Background and Objectives: Gestational Diabetes Mellitus (GDM) is the most common metabolic disorder in pregnancy. In case of early detection, some of its complications can be prevented. The aim of this study was to investigate early prediction of GDM by logistic regression (LR), discriminant analysis (DA), decision tree (DT) and perceptron artificial neural network (ANN) and to compare these m...
Prediction of Fault-Prone Software Modules using Statistical and Machine Learning Methods
Demand for producing quality software has rapidly increased during the last few years. This is leading to an increase in the development of machine learning methods for exploring data sets, which can be used in constructing models for predicting quality attributes such as fault proneness, maintenance effort, testing effort, productivity and reliability. This paper examines and compares logistic regres...
Comparing the Results of Logistic Regression Model and Classification and Regression Tree Analysis in Determining Prognostic Factors for Coronary Artery Disease in Mashhad, Iran
Background and purpose: Understanding of the risk factors for cardiovascular artery disease, which is the leading cause of death worldwide, can lead to essential changes in its etiology, prevalence, and treatment. The aim of this study was to compare the results of logistic regression model and Classification and Regression Tree Analysis (CART) in determining the prognostic factors for coronary...
Models to predict cardiovascular risk: comparison of CART, multilayer perceptron and logistic regression
The estimate of a multivariate risk is now required in guidelines for cardiovascular prevention. Limitations of existing statistical risk models have led to the exploration of machine-learning methods. This study evaluates the implementation and performance of a decision tree (CART) and a multilayer perceptron (MLP) to predict cardiovascular risk from real data. The study population was randomly split in a...
Journal title:
- Journal of Machine Learning Research
Volume 4, Issue -
Pages -
Publication date: 2003